## Digital Communication IC Design

### Final Projects

(Due by 12/25 11:59 AM)

In this final project, you will learn more about algorithm and architecture co-design for digital communication IC applications. Each group may contain at most two students. Upload all your programs (C/MATLAB/Python/Verilog) and reports to NTUCool. Students who use ADFP should put all Verilog files and synthesis results in one folder named "SubmittedFiles"; the TA will download it.

#### Total: 100 points

#### Please design a MIMO detector for a $4 \times 4$ MIMO system using 8-PSK signals.

The algorithm that you use must belong to one of the following categories.

- Depth-first sphere decoding [1]
- Breadth-first sphere decoding, the K-best algorithm, or list sphere decoding
- Best-first search

Zero-forcing, MMSE, and OSIC algorithms can NOT be used.

The 8-PSK constellation mapping is given in the following figure.



1. Please state the algorithm that you use and provide the performance of your design, namely the bit-error-rate (BER) versus signal-to-noise-ratio (SNR) simulation. For comparison, include the ML detector (in this project, ML detection can be achieved by exhaustive search), the floating-point simulation results of your algorithm, and the fixed-point simulation results of your algorithm. The MIMO system is described by the following equation:

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{v}$$

where $\mathbf{y}$ is the $4 \times 1$ received signal, $\mathbf{x}$ is the $4 \times 1$ transmitted signal, and $\mathbf{v}$ is the $4 \times 1$ noise vector. The simulations must use more than 500 randomly generated channel matrices $\mathbf{H}$ whose elements are i.i.d. complex Gaussian with zero mean and unit variance. The SNR is defined by

$$\text{SNR} = 10 \log_{10} \left( \frac{E\{\mathbf{x}^H \mathbf{x}\}}{\sigma^2} \right)$$

where $E\{\mathbf{v}\mathbf{v}^H\} = \sigma^2\mathbf{I}$, with $\sigma^2$ the noise variance. The SNR degradation of your fixed-point simulation relative to the ML detector is taken into account in grading: points are deducted from the final-project score according to the deduction $D$, which is determined by the degradation as follows.

$$D = \begin{cases} 0, & \text{degradation} < 0.5\text{ dB} \\ -1, & 0.5\text{ dB} \le \text{degradation} < 1\text{ dB} \\ -2, & 1\text{ dB} \le \text{degradation} < 1.5\text{ dB} \\ -3, & 1.5\text{ dB} \le \text{degradation} \le 2\text{ dB} \end{cases}$$
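As a minimal sketch, the SNR at BER = $10^{-4}$ can be read off each simulated curve by interpolating linearly on $\log_{10}(\text{BER})$, and the degradation can then be mapped to $D$. Python with NumPy is an assumption here. The helper names `snr_at_ber` and `deduction` are hypothetical, and the code assumes each BER curve decreases monotonically and brackets the target.

```python
import numpy as np


def snr_at_ber(snr_db, ber, target=1e-4):
    """Interpolate the SNR (dB) at which a BER curve crosses `target`.

    Interpolation is linear in log10(BER), which is a reasonable local model of
    a waterfall curve between neighbouring simulation points. Assumes BER
    decreases monotonically with SNR and that the curve brackets `target`.
    """
    log_ber = np.log10(np.asarray(ber, dtype=float))
    snr = np.asarray(snr_db, dtype=float)
    # np.interp needs increasing sample points, so reverse the decreasing log-BER axis
    return float(np.interp(np.log10(target), log_ber[::-1], snr[::-1]))


def deduction(degradation_db):
    """Map the SNR degradation (dB) at BER = 1e-4 to the deduction D."""
    if degradation_db < 0.5:
        return 0
    if degradation_db < 1.0:
        return -1
    if degradation_db < 1.5:
        return -2
    if degradation_db <= 2.0:
        return -3
    return None  # > 2 dB is not covered by D (see the AT-product rule)
```

Here is an example using made-up curves, not results. Take `snr = [10, 12, 14, 16]`, ML BER `[1e-2, 1e-3, 1e-4, 1e-5]`, and fixed-point BER `[2e-2, 3e-3, 4e-4, 5e-5]`. The crossings are 14 dB and about 15.33 dB, so the degradation is about 1.33 dB and `deduction` returns -2. The data tip on your own figure is still the required evidence.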

Please mark the SNR degradation at BER = $10^{-4}$ in your figure yourselves, using the data-tip function of MATLAB/Python figures. If the SNR degradation is larger than 2 dB, your AT product is classified into the last level.
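The following is a minimal Monte Carlo sketch of the system model with exhaustive-search ML detection, assuming Python with NumPy. It draws i.i.d. zero-mean, unit-variance complex Gaussian channels. It sets $\sigma^2$ from the SNR definition, using $E\{\mathbf{x}^H\mathbf{x}\} = 4$ for unit-energy 8-PSK symbols on four antennas. It then searches all $8^4 = 4096$ candidate vectors. The bit labels are a hypothetical Gray mapping, not the mapping in the assignment's constellation figure; substitute that figure's labels before reporting BER. All names are illustrative.

```python
import itertools

import numpy as np

M_PSK, NT = 8, 4  # 8-PSK, 4 transmit antennas
CONST = np.exp(2j * np.pi * np.arange(M_PSK) / M_PSK)  # unit-energy 8-PSK points
# HYPOTHETICAL bit labels (Gray code on the point index k); replace them with
# the mapping given in the assignment's constellation figure.
LABELS = np.array([k ^ (k >> 1) for k in range(M_PSK)])
BITS = (LABELS[:, None] >> np.arange(2, -1, -1)) & 1  # (8, 3), MSB first

# All 8^4 = 4096 candidate index vectors as columns, for exhaustive ML search
CAND_IDX = np.array(list(itertools.product(range(M_PSK), repeat=NT))).T
CAND = CONST[CAND_IDX]  # (4, 4096)


def ml_detect(H, y):
    """Exhaustive-search ML: return the index vector minimising ||y - H x||^2."""
    metric = np.sum(np.abs(y[:, None] - H @ CAND) ** 2, axis=0)
    return CAND_IDX[:, np.argmin(metric)]


def simulate_ber(snr_db, n_channels=500, n_vectors=20, seed=0):
    """Monte Carlo BER of ML detection at one SNR point (in dB)."""
    rng = np.random.default_rng(seed)
    # E{x^H x} = NT for unit-energy symbols, so sigma^2 = NT / 10^(SNR/10)
    sigma2 = NT / 10 ** (snr_db / 10)
    bit_errors = total_bits = 0
    for _ in range(n_channels):
        # i.i.d. complex Gaussian entries with zero mean and unit variance
        H = (rng.standard_normal((NT, NT))
             + 1j * rng.standard_normal((NT, NT))) / np.sqrt(2)
        for _ in range(n_vectors):
            idx = rng.integers(0, M_PSK, NT)
            # circular noise with E{v v^H} = sigma^2 I
            v = np.sqrt(sigma2 / 2) * (rng.standard_normal(NT)
                                       + 1j * rng.standard_normal(NT))
            y = H @ CONST[idx] + v
            idx_hat = ml_detect(H, y)
            bit_errors += int(np.sum(BITS[idx] != BITS[idx_hat]))
            total_bits += 3 * NT
    return bit_errors / total_bits
```

Sweep the SNR with, for example, `for snr in range(0, 22, 2): print(snr, simulate_ber(snr))`. With the defaults, each point simulates only 500 × 20 × 12 = 120,000 bits, so raise `n_vectors` near BER = $10^{-4}$ until each point collects enough bit errors.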

2. Explain your design concept or the flow chart of your algorithm in the report. You do not need to implement QR decomposition; rely on software to perform it. Thus, given $\mathbf{H} = \mathbf{QR}$, the inputs to your design are the matrix $\mathbf{R}$ and the vector $\tilde{\mathbf{y}} = \mathbf{Q}^H \mathbf{y}$.
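The sketch below shows how the software-side QR step feeds a depth-first sphere decoder, assuming Python with NumPy. `np.linalg.qr` computes $\mathbf{H} = \mathbf{QR}$, possibly with negative diagonal entries in $\mathbf{R}$; this sketch does not need them normalised. Because the $4 \times 4$ matrix $\mathbf{Q}$ is unitary, $\|\mathbf{y} - \mathbf{Hx}\|^2 = \|\tilde{\mathbf{y}} - \mathbf{Rx}\|^2$, so a search over $\mathbf{R}$ and $\tilde{\mathbf{y}}$ returns the ML solution. The decoder visits children in increasing partial-Euclidean-distance (PED) order (Schnorr-Euchner enumeration) and shrinks the radius at every new leaf. It is an illustrative floating-point model of the depth-first category, not the architecture of [1], and the function names are hypothetical.

```python
import itertools

import numpy as np

CONST = np.exp(2j * np.pi * np.arange(8) / 8)  # unit-energy 8-PSK points


def preprocess(H, y):
    """Software side: H = QR and y_tilde = Q^H y."""
    Q, R = np.linalg.qr(H)
    return R, Q.conj().T @ y


def sphere_decode(R, y_t):
    """Depth-first sphere decoding with Schnorr-Euchner child ordering.

    Minimises ||y_t - R x||^2 over all x in CONST^n. Because Q is unitary,
    this equals ||y - H x||^2, so the result is the ML solution.
    """
    n = R.shape[0]
    x = np.zeros(n, dtype=complex)
    best = {"d": np.inf, "x": None}  # current squared radius and best leaf

    def search(i, ped):
        # Interference from the layers j > i already fixed on this path
        b = y_t[i] - R[i, i + 1:] @ x[i + 1:]
        inc = np.abs(b - R[i, i] * CONST) ** 2  # PED increment of each child
        for k in np.argsort(inc):  # closest child first
            d = ped + inc[k]
            if d >= best["d"]:
                break  # children are sorted, so the rest are outside the radius
            x[i] = CONST[k]
            if i == 0:
                best["d"], best["x"] = d, x.copy()  # new leaf shrinks the radius
            else:
                search(i - 1, d)

    search(n - 1, 0.0)
    return best["x"]


def ml_detect(H, y):
    """Reference: exhaustive search over all 8^n candidate vectors."""
    cands = np.array(list(itertools.product(CONST, repeat=H.shape[1]))).T
    metric = np.sum(np.abs(y[:, None] - H @ cands) ** 2, axis=0)
    return cands[:, np.argmin(metric)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
    x = CONST[rng.integers(0, 8, 4)]
    y = H @ x + 0.3 * (rng.standard_normal(4) + 1j * rng.standard_normal(4))
    R, y_t = preprocess(H, y)
    print(np.allclose(sphere_decode(R, y_t), ml_detect(H, y)))
```

The recursive form favours readability. A hardware decoder would instead unroll the four layers into a fixed datapath and quantise the PEDs to the word lengths you report.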
3. Consider only slow fading in your hardware design; that is, the matrix **R** remains unchanged throughout your hardware simulation. The matrix **R** is sent into the hardware at the beginning over 3 or 4 clock cycles and stored in registers. For a fair comparison of area and throughput, the I/O of the hardware is defined as follows.
   - (a) **flagChannelorData**: If it is 1, the matrix **R** is sent through the **InData** ports. If it is 0, the vector $\tilde{\mathbf{y}}$ is sent through the **InData** ports.
   - (b) **InData** $(4 \times 2 \times W_I)$: To support the highest throughput, this port is mainly used to feed the vector $\tilde{\mathbf{y}}$, except at the beginning, when the matrix $\mathbf{R}$ can also be sent through this port. The word length $W_I$ is determined by you. Note that each input vector $\tilde{\mathbf{y}}$ lasts for only one clock cycle.
   - (c) **OutData** $(4 \times 2 \times W_O)$: The detected 8-PSK symbols. Note that the demapper can be realized in software outside the hardware design.
   - (d) **Clk**: clock signal.
   - (e) **Reset**: reset signal.
   - (f) **Ctr**: any other I/O control signals that you need (for example, a Valid signal for the input $\tilde{\mathbf{y}}$).
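To drive **InData** from a testbench, each real and imaginary part of $\tilde{\mathbf{y}}$ must be quantised to $W_I$ bits. The sketch below makes several assumptions, all hypothetical rather than part of the specification:

- a signed two's-complement format with `f` fractional bits;
- round-to-nearest with saturation;
- a packing order with the real and imaginary parts of each entry in turn, first entry at the MSB end.

The actual bit order is whatever your Verilog port declaration uses.

```python
import numpy as np


def to_fixed(value, w, f):
    """Round to a signed w-bit two's-complement integer with f fractional bits, saturating."""
    lo, hi = -(1 << (w - 1)), (1 << (w - 1)) - 1
    return int(np.clip(np.round(value * (1 << f)), lo, hi))


def pack_indata(y_t, w, f):
    """Pack a 4x1 complex vector into one 4*2*w-bit integer word.

    HYPOTHETICAL order, MSB to LSB: Re y[0], Im y[0], Re y[1], ..., Im y[3].
    """
    word = 0
    for z in y_t:
        for part in (z.real, z.imag):
            word = (word << w) | (to_fixed(part, w, f) & ((1 << w) - 1))
    return word


def unpack_indata(word, w, f, n=4):
    """Inverse of pack_indata: recover the quantised values as complex floats."""
    vals = []
    for k in range(2 * n):
        raw = (word >> (w * (2 * n - 1 - k))) & ((1 << w) - 1)
        if raw >= 1 << (w - 1):  # sign-extend the w-bit field
            raw -= 1 << w
        vals.append(raw / (1 << f))
    return np.array(vals[0::2]) + 1j * np.array(vals[1::2])
```

With, for example, $W_I = 12$ and 8 fractional bits (illustrative choices), `f"{pack_indata(y_t, 12, 8):024x}"` writes one 96-bit input vector as 24 hex digits per line. That format suits `$readmemh` in the testbench. Running the fixed-point BER simulation on `unpack_indata(pack_indata(...))` keeps it bit-consistent with what the hardware sees.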
4. Please draw the block diagram of your hardware architecture, similar to Figs. 5 and 6 in [1], and mark the word lengths in your design, especially $W_I$, $W_O$, and the (partial) Euclidean distance. Explain the concept and functionality of each module.

5. The timing of the test pattern should be given as follows:



**Note that you cannot buffer 11 input vectors $\tilde{\mathbf{y}}_i$ and 11 detected outputs in storage.**

Although 500 channel matrices are used in the performance simulation, the hardware simulation uses only one channel matrix, and for simplicity the throughput is evaluated at the SNR corresponding to a BER of $10^{-2}$.

Please show the hardware simulation results using the input pattern described above. If the simulation spans many clock cycles, cut the timing diagram into several segments and paste them all. Also keep and show the timestamps in your simulation figures so that the clock-period setting can be verified.

- (a) Please provide the timing diagram of the behavioral simulation and your SNR setting.
- (b) Please provide the timing diagram of the post-synthesis simulation. Indicate the clock-period setting $(T_s)$ of your post-synthesis simulation and the values of M, L, and M'. Explain the latency L of your design.
- (c) The processing time of one MIMO symbol detection is calculated as $0.1 M T_s$. If your throughput is not constant, this value is the average detection time. Please indicate your processing time.
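Processing time and throughput both follow from $0.1 M T_s$. The sketch assumes, as the factor 0.1 suggests, that M clock cycles cover the ten test vectors. It also assumes each $4 \times 1$ vector carries $4 \times 3 = 12$ bits: four antennas at 3 bits per 8-PSK symbol. The values M = 40 and $T_s$ = 5 ns in the example are hypothetical.

```python
BITS_PER_VECTOR = 4 * 3  # 4 transmit antennas x 3 bits per 8-PSK symbol


def processing_time_ns(m_cycles, ts_ns):
    """Average detection time of one MIMO vector: 0.1 * M * Ts (in ns)."""
    return 0.1 * m_cycles * ts_ns


def throughput_mbps(m_cycles, ts_ns):
    """Detected bits per second, in Mbit/s (bits per ns = Gbit/s, times 1e3)."""
    return BITS_PER_VECTOR / processing_time_ns(m_cycles, ts_ns) * 1e3
```

For example, M = 40 cycles at $T_s$ = 5 ns gives a processing time of 20 ns and a throughput of 600 Mbit/s.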

6. Show that your hardware implementation is consistent with your fixed-point simulation by drawing a figure that indicates the errors of the ten detected symbols.
7. Show your synthesis report. If you use Vivado, please use the following equation to combine all resource consumption into one metric [2]: $\text{NA} = \text{LUTs} + \text{DSPs} \times 280 + \text{FFs}$.
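One way to build that figure is to compare, vector by vector, the **OutData** words dumped from the post-synthesis simulation with the outputs of your fixed-point model. The sketch below assumes both are available as integer arrays of shape (10, 4, 2), holding the (I, Q) word of each antenna; this layout and the name `symbol_mismatches` are hypothetical, so adapt them to your own dump format.

```python
import numpy as np


def symbol_mismatches(hw_out, fx_out):
    """Count, per test vector, the antennas whose detected symbol differs.

    hw_out, fx_out: integer arrays of shape (n_vectors, 4, 2) with the (I, Q)
    OutData words from hardware simulation and from the fixed-point model.
    """
    hw_out, fx_out = np.asarray(hw_out), np.asarray(fx_out)
    if hw_out.shape != fx_out.shape:
        raise ValueError("hardware and model outputs must have the same shape")
    differs = np.any(hw_out != fx_out, axis=-1)  # (n_vectors, 4): I or Q mismatch
    return differs.sum(axis=1)  # mismatching antennas per vector
```

A bar or stem plot of the returned counts against the vector index (1 to 10) is one way to show the required figure. An all-zero plot demonstrates bit-exact agreement.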

If you use ADFP, please simply show the total area after synthesis.

8. Calculate your AT product as

$$\text{AT} = \text{Area (or NA)} \times \text{processing time}.$$

The AT products will be classified into four levels, and your implementation will be evaluated by the level of its AT product.

(Note: 15% of the score is decided by the level of the AT product. The remaining 85% is decided by the simulations, functionality, descriptions, timing diagrams, etc.)
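The two metrics combine as sketched below. The resource counts, M, and $T_s$ in the example are hypothetical, and the unit of the AT product is your area unit (NA for Vivado, the synthesized area for ADFP) times nanoseconds.

```python
def normalized_area(luts, dsps, ffs):
    """Vivado resource metric from [2]: NA = LUTs + DSPs x 280 + FFs."""
    return luts + dsps * 280 + ffs


def at_product(area, m_cycles, ts_ns):
    """AT = area (or NA) x processing time, with processing time = 0.1 * M * Ts (ns)."""
    return area * (0.1 * m_cycles * ts_ns)
```

For example, 5,000 LUTs, 10 DSPs, and 3,000 FFs give NA = 10,800. With a processing time of 20 ns (M = 40, $T_s$ = 5 ns), the AT product is 216,000.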

9. Highlight any design innovations. List the work items and contribution weightings of both members.

#### Reference:

- [1] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner and H. Bölcskei, "VLSI implementation of MIMO detection using the sphere decoding algorithm," in *IEEE Journal of Solid-State Circuits*, vol. 40, no. 7, pp. 1566-1577, July 2005.
- [2] J. Chen, Z. Zhang, H. Lu, J. Hu and G. E. Sobelman, "An Intra-Iterative Interference Cancellation Detector for Large-Scale MIMO Communications Based on Convex Optimization," in *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 63, no. 11, pp. 2062-2072, Nov. 2016.